ABSTRACT: An emerging trend toward computer-mediated collaborative learning environments 
promotes lively exchanges between learners in order to facilitate learning. Discourse can play an 
important role in enhancing epistemology, pedagogy, and assessments in these environments. In 
this paper, we highlight some of our recent work showing the advantages of using theoretically 
grounded automated linguistics tools to identify pedagogically valuable discourse features that 
can be applied in collaborative learning, intelligent tutoring systems (ITS), computer-mediated 
collaborative learning (CMCL), and MOOC environments. 

1. INTRODUCTION 


Current educational practices suggest an emerging trend toward computer-mediated collaborative 
learning environments and groupware tools, such as email, chat, threaded discussion, massive open 
online courses (MOOCs), and trialog-based intelligent tutoring systems (ITSs). This has stimulated recent 
discussion among educational data mining and learning analytics researchers about how best to model 
learners' cognitive, motivational, affective, and social processes and incorporate pedagogically 
beneficial, adaptive strategies into these environments. In this paper, we highlight some of the recent 
work showing the advantages of using theoretically grounded automated linguistics tools to identify 
pedagogically valuable discourse features that can be applied in collaborative learning, ITS, CMCL, and 
MOOC environments. 

1.1. Theoretical framework 

Collaborative language is the factor that sets CMCL learning apart from individual learning (Dowell, 
Cade, Tausczik, Pennebaker, & Graesser, 2014). In this context, language, discourse, and communication 
play a critical and complex role that can provide insight regarding social processes (e.g., establishing a 
common ground and vision), individual and group cognitive processes (e.g., knowledge construction), 
and affective processes (e.g., confusion, frustration, boredom, flow/engagement). Psychological 
frameworks of comprehension and learning have identified the representations, structures, strategies, 
and processes at multiple levels of discourse (Graesser & McNamara, 2011; Kintsch, 1998; Snow, 2002). 
Five levels have frequently been proposed in these frameworks: 1) words, 2) syntax, 3) the explicit 
textbase, 4) the situation model (sometimes called the mental model), and 5) the discourse genre and 
rhetorical structure (the type of discourse and its composition). In the educational context, learners can 
experience communication misalignments and comprehension breakdowns at different levels. These 
breakdowns and misalignments have important implications for cognitive processing. 


2. METHODS 


2.1. Computational Linguistic Analysis Tool 

Coh-Metrix is a computational linguistics facility that analyzes higher-level features of language and 
discourse (Graesser, McNamara, & Kulikowich, 2011; McNamara, Graesser, McCarthy, & Cai, 2014). 
Coh-Metrix includes sophisticated methods of natural language processing, providing over 100 measures at 
multiple levels, including genre, cohesion, syntax, and words, as well as other characteristics of language and 
discourse. Coh-Metrix also offers measures of linguistic complexity and formality, characteristics of 
words, and readability scores. To reduce this large set to a more manageable number of measures, a 
study examined 53 Coh-Metrix measures for 37,520 texts in the TASA (Touchstone Applied Science 
Associates) corpus, 
which represents what typical high school students have read throughout their lives (Graesser et al., 
2011). A principal components analysis was conducted on the corpus, yielding eight components that 
explained an impressive 67.3% of the variability among texts; the top five components explained over 
50% of the variance. Importantly, the components aligned with the discourse levels previously proposed 
in multilevel theoretical frameworks of cognition and comprehension (Graesser & McNamara, 2011; 
Kintsch, 1998; Snow, 2002) and thus are ideal for investigating trends in learning-oriented interactions. 
The five major dimensions are succinctly defined below, starting with the most global level (genre): 

• Narrativity: The extent to which the text is in the narrative genre, which conveys a story, a 
procedure, or a sequence of episodes of actions and events with animate beings. Informational 
texts on unfamiliar topics are at the opposite end of the continuum. 

• Deep Cohesion: The extent to which the ideas in the text are cohesively connected at a deeper 
conceptual level that signifies causality or intentionality. 

• Referential Cohesion: The extent to which explicit words and ideas in the text are connected 
with each other as the text unfolds. 

• Syntactic Simplicity: The extent to which sentences have few words and simple, familiar syntactic structures. At the 
opposite pole are structurally embedded sentences that require the reader to hold many words 
and ideas in working memory. 

• Word Concreteness: The extent to which content words are concrete, meaningful, and evoke 
mental images as opposed to abstract words. 
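
The dimensionality reduction described in Section 2.1 can be illustrated with a minimal sketch of a principal components analysis over a texts-by-measures matrix. This is not the Coh-Metrix or Graesser et al. (2011) implementation: the data below are synthetic, and the matrix sizes, the function name pca_explained_variance, and the number of latent factors are assumptions chosen only for illustration.

```python
import numpy as np


def pca_explained_variance(X):
    """PCA on standardized columns.

    Returns the proportion of variance explained by each component
    (largest first) and the matching component directions (columns).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each measure so differences in scale do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized columns is the correlation matrix.
    R = np.cov(Z, rowvar=False)
    # eigh is for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs


# Synthetic stand-in for a texts-by-measures matrix (hypothetical sizes).
rng = np.random.default_rng(0)
n_texts, n_measures, n_latent = 500, 12, 3
latent = rng.normal(size=(n_texts, n_latent))
mixing = rng.normal(size=(n_latent, n_measures))
X = latent @ mixing + 0.5 * rng.normal(size=(n_texts, n_measures))

ratios, loadings = pca_explained_variance(X)
cumulative = np.cumsum(ratios)
for k in range(n_latent):
    print(f"component {k + 1}: {ratios[k]:.3f} (cumulative {cumulative[k]:.3f})")
```

The cumulative proportions correspond to the kind of summary reported above (for example, the share of variability explained by the top five components); interpreting each component requires inspecting which measures load most strongly on it.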

3. RECENT FINDINGS 

Our recent work used Coh-Metrix to explore cognitive, affective, social, and socio-affective processes 
during collaborative learning, ITS, and MOOC interactions (Cade, Dowell, Graesser, Tausczik, & 
Pennebaker, 2014; D'Mello, Dowell, & Graesser, 2009; Dowell et al., 2014; Joksimovic et al., under 
review). For instance, Dowell et al. (2014) explored the possibility of using discourse features to predict 
student and group performance during collaborative learning interactions. We investigated the linguistic 
patterns of group chats, within an online collaborative learning exercise, on five discourse dimensions 
using Coh-Metrix. Our results indicated that students who engaged in deeper cohesive integration and 
generated more complicated syntactic structures performed significantly better. At the group level, 
collaborative groups that engaged in deeper cohesive and expository-style interactions 
performed significantly better on post-tests. In line with this, Cade et al. (2014) demonstrated that 
cognitive linguistic cues can be used in detecting students' socio-affective attitudes towards fellow 
students in CMCL environments, which may have long-term consequences for their motivation and 
continued use of such systems. Our current and future research focuses on exploring how these 
strategies transfer to increasingly larger, more culturally diverse populations of learners and extending 
our conclusions to practical applications that enhance learning and teaching. 
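
As a rough illustration of how scores on the five discourse dimensions might be related to a performance measure, the following sketch fits an ordinary least-squares model. It is not the analysis used in Dowell et al. (2014): the data are synthetic, the weights are invented for the example (positive for deep cohesion, negative for syntactic simplicity, loosely mirroring the direction of the findings above), and the function name fit_linear_model is hypothetical.

```python
import numpy as np

DIMENSIONS = [
    "narrativity",
    "deep_cohesion",
    "referential_cohesion",
    "syntactic_simplicity",
    "word_concreteness",
]


def fit_linear_model(features, scores):
    """Ordinary least squares with an intercept; returns [intercept, weights...]."""
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef


# Synthetic students: standardized dimension scores and a noisy outcome.
rng = np.random.default_rng(1)
n_students = 200
features = rng.normal(size=(n_students, len(DIMENSIONS)))
# Illustrative weights only; these are NOT estimates from any study.
true_weights = np.array([0.0, 0.6, 0.0, -0.4, 0.0])
scores = 0.5 + features @ true_weights + 0.1 * rng.normal(size=n_students)

coef = fit_linear_model(features, scores)
for name, weight in zip(["intercept"] + DIMENSIONS, coef):
    print(f"{name:>22s}: {weight:+.3f}")
```

With real chat data, students are nested in groups, so a model that accounts for that grouping would likely be more appropriate than this flat regression.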

4. CONTRIBUTION TO LEARNING ANALYTICS 

These results suggest that students' latent cognitive, affective, and social processes can be monitored by 
analyzing language and discourse. An interdisciplinary approach that combines psychological theories of 
discourse comprehension with computational linguistics methodologies holds the potential for enabling 
substantially improved learning environments by providing real-time detection of student and group 
performance and by using this information to develop student models and provide adaptive learning 
supports. 